ABSTRACT



Given a set of objects (A, B, C, . . . ), each described by a 
set of attribute values, and given a classification of these 
objects into categories, a similarity function accounts well 
for this classification when only a small number of objects 
are not correctly classified. A method for modelling a
similarity function using a neural network comprises the
steps of: (a) inputting feature vectors to a raw input stage of
a neural network respectively for object S in the given
category, for other objects G in the same category being
compared to the object S, and for object B outside the given
category; (b) coupling the raw inputs of feature vectors for
S, G, and B to an input layer of the neural network
performing respective set operations required for the
similarity function so as to have a property of monotonicity; (c)
coupling the input elements of the input layer to respective
processing elements of a hidden layer of the neural network
for computing similarity function results adaptively with
different values of a coefficient w of the similarity function;
(d) coupling the processing elements of the hidden layer to
respective output elements of an output layer of the neural
network for providing respective outputs of an error function
measuring the extent to which object S is more similar to
object G than to object B; and (e) obtaining an optimal
coefficient w by back propagation through the neural
network which minimizes the error outputs of the error
function.



12 Claims, 4 Drawing Sheets

METHOD FOR MODELLING SIMILARITY
FUNCTION USING NEURAL NETWORK



CROSS-REFERENCE TO RELATED
APPLICATION

This is a continuation of Ser. No. 07/698,646, filed May
10, 1991, now abandoned.

This patent application is related to copending U.S. patent
application Ser. No. 08/062,481 of Schwanke et al., entitled
"Method For Software Structure Analysis Using Conceptual
Clustering", filed on May 17, 1993, and to U.S. patent
application Ser. No. 06/241,278 of the same inventors as in the
present invention, entitled "Method For Estimating Similarity
Function Coefficients From Object Classification Data",
filed on Nov. 16, 1994.

The following related patent applications are being filed
on even date herewith in the name of Robert W. Schwanke,
one of the present inventors. The disclosed subject matter
thereof is herein incorporated by reference. The application
entitled A FEATURE RATIO METHOD FOR COMPUTING
SOFTWARE SIMILARITY discloses a method for
computing the similarity between first and second software
objects. The application entitled AN INTERACTIVE
METHOD OF USING A GROUP SIMILARITY MEASURE
FOR PROVIDING A DECISION ON WHICH
GROUPS TO COMBINE discloses a method of using a
group similarity measure, with an analyst, on a set containing
a plurality of groups, the groups containing software
objects, for providing a decision on which groups to combine.
The application entitled A METHOD FOR COMPUTING
THE SIMILARITY BETWEEN TWO GROUPS OF
OBJECTS discloses a method for computing the similarity
between two groups of objects wherein the similarity
between any pair of objects can be computed by a similarity
function, the method being for use in software clustering.
The application entitled A TWO-NEIGHBORHOOD
METHOD FOR COMPUTING THE SIMILARITY
BETWEEN TWO GROUPS OF OBJECTS discloses
another method for computing the similarity between two
groups of objects wherein the similarity between any pair of
objects can be computed by a similarity function, the method
being for use in software clustering. The application entitled
A METHOD FOR ADAPTING A SIMILARITY FUNCTION
FOR IDENTIFYING MISCLASSIFIED SOFTWARE
OBJECTS discloses a method for providing initial
estimates for the weights and coefficients of a similarity
function, using them to identify an initial maverick list,
removing the mavericks from their assigned groups, and
then outputting the modified groups, using only qualified
data for tuning the similarity function. The application
entitled A METHOD OF IDENTIFYING MISCLASSIFIED
SOFTWARE OBJECTS discloses a method for identifying
software objects that have been assigned to a wrong group,
wherein the similarity between objects is known, such as by
evaluating a similarity function.






The present invention relates to a method for evaluating 
the classification of objects into categories, and particularly,
to one for estimating coefficients for a similarity function
usable for classification. The method has particular
application for automated analysis of the composition structure of
a large software program.



BACKGROUND OF INVENTION 

A composition structure for a large software program is an 
"organization chart" that groups procedures into modules, 
modules into subsystems, subsystems into bigger sub- 
systems, and so on. The composition structure chart is used 
for project management, integration planning, design, 
impact analysis, and almost every other part of software 
development and maintenance. In a well-designed system,
each module or subsystem contains a set of software units
(procedures, data structures, types, modules, subsystems)
that collectively serve a common purpose in the system.
However, because the purposes and roles of software units
frequently overlap, it is not always easy to decide how a
system should be divided up. Furthermore, during the
evolution of a large software system over several years, many
software units are added, deleted, and changed. The
resulting organization chart may no longer have any technical
rationale behind it, but may instead be the result of economic
expediency, or simple neglect. Its poor quality then increases
the cost of software maintenance, by impeding technical
analysis.

Generally, when a software system that has been developed
by a large team of programmers has matured over
several years, changes to the code may introduce unexpected
interactions between diverse parts of the system. This can
occur because the system has become too huge for one
person to fully understand, and the original design
documentation has become obsolete as the system has evolved.
Symptoms of structural problems include too many unnecessary
recompilations, unintended cyclic dependency
chains, and some types of difficulties with understanding,
modifying, and testing the system. Most structural problems
cannot be solved by making a few "small" changes, and
most require the programmer to understand the overall
pattern of interactions in order to solve the problem.

A field of application of the present invention is in the
implementation of a software architect's "assistant" for a
software maintenance environment. The "assistant" is a
computer program for helping the software architect to
analyze the structure of the system, specify an architecture
or chart for it, and determine whether the actual software is
consistent with the specification. Since the system's
structural architecture may never have been formally specified, it
is also desirable to be able to "discover" the architecture by
automatically analyzing the existing source code. It should
also be possible to critique an architecture by comparing it
to the existing code and suggesting changes that would
produce a more modular specification.

A common approach to structural analysis is to treat
cross-reference information as a graph in which software
units appear as nodes and cross-references appear as edges.
Various methods, both manual and automatic, may then be
used to analyze the graph. Recent work has used clustering
methods to summarize cross-reference graphs, by clustering
nodes into groups, and then analyzing edges between
groups. See, e.g., Yoelle S. Maarek and Gail E. Kaiser,
"Change Management in Very Large Software Systems",
Phoenix Conference on Computer Systems and
Communications, IEEE, March 1988, pp. 280-285, and Richard W.
Selby and Victor R. Basili, "Error Localization During
Software Maintenance: Generating Hierarchical System
Descriptions from the Source Code Alone", Conference on
Software Maintenance-1988, IEEE, Oct. 1988. Other
currently available methods for recovering, restructuring, or
improving the composition structure chart are "manual",
involving much reading and trial and error.

Clustering algorithms may be either batch or incremental.
A batch algorithm looks at all of the data on all objects
before beginning to cluster any of them. An incremental
algorithm typically looks at one object at a time, clustering it
with the objects it has already looked at before looking at the
next object. The heart of a batch algorithm is the similarity
measure, which is a function that measures how "similar"
two groups of objects are (each group can have one or more
members). The batch algorithm takes a large set of
individual objects and places them together in groups, by
repeatedly finding the two most similar objects (or groups of
objects) and putting them together. The batch algorithm
typically produces groups with two subgroups. This is
unnatural for most purposes; instead it is preferable to merge
some subgroups to make larger groups.
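
By way of illustration only, the batch procedure described above
may be sketched as follows. The sketch assumes that each object is
described by a set of features, that two objects are compared by the
ratio of shared features to all features, and that two groups are
compared by averaging over their member pairs; the function names
and both measures are hypothetical choices made for the example and
are not taken from the cited work.

```python
from itertools import combinations

def jaccard(a, b):
    """Assumed object similarity: shared features over all features."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def group_similarity(g1, g2, features):
    """Assumed group similarity: mean similarity over all member pairs."""
    pairs = [(x, y) for x in g1 for y in g2]
    return sum(jaccard(features[x], features[y]) for x, y in pairs) / len(pairs)

def batch_cluster(features):
    """Repeatedly merge the two most similar groups; return a binary tree."""
    groups = [(name,) for name in sorted(features)]   # one group per object
    trees = {g: g[0] for g in groups}                 # group -> subtree so far
    while len(groups) > 1:
        g1, g2 = max(combinations(groups, 2),
                     key=lambda p: group_similarity(p[0], p[1], features))
        merged = g1 + g2
        trees[merged] = (trees.pop(g1), trees.pop(g2))
        groups = [g for g in groups if g not in (g1, g2)] + [merged]
    return trees[groups[0]]

tree = batch_cluster({"p": {1, 2}, "q": {1, 2}, "r": {3}, "s": {3, 4}})
```

Every interior node of the resulting tree has exactly two children,
which is the binary grouping noted above as unnatural for most
purposes.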

Prior art applications of clustering to software analysis
have generally fallen into two categories. One category is
conceptual clustering for re-use, as discussed in Maarek and
Kaiser, referred to above. This work finds a way to specify
the external interface of a software unit, including its
function, and then classify units drawn from many different
systems to place them in a library where they can be found
and re-used. They cannot use shared names for classification
because two similar units drawn from different systems
would use different names.

Another category is statistical clustering which attempts
to predict errors and predict the impact of changes, as
discussed in Selby and Basili, referred to above. This work
classifies the software units according to the number of
"connections" between them, which may be procedure calls,
data flow paths, or names used in one group that are the
names of units in the other group. The resulting groups can
be used to plan integration sequences for large software
systems, and can be measured to predict the likelihood of
errors in them. However, the groups do not have lists of
shared characteristics that would explain to the programmer
why they were grouped together. There is no evidence yet
that the groups computed this way would be appropriate for
describing the structure of the whole system.

It is therefore a general object of the present invention
to automate the task of analyzing the composition structure
of a large software program by using computerized clustering
methods for grouping objects into groups according to
similar attributes. It is a particular object of the invention to
provide feedback on classification decisions that can lead to
improved classification. Specifically, it is desired to provide
a method for estimating the optimal coefficients for a
similarity function which accounts well for the classification of
objects in a category.

SUMMARY OF INVENTION



Conceptual clustering methods are applied to computerized
software structure analysis. Objects are classified for
inclusion in a group, module, or subsystem by measuring the
extent to which its members use the same software units, i.e.,
clustering by NAMES-USED, and the extent to which the
members of a subsystem are used together by other software
units, i.e., clustering by USER-NAMES. The objects are
described by a pair of measured attribute values for
NAMES-USED and USER-NAMES.

In the present invention, it is recognized that providing
feedback on classification decisions can lead to improved
classification. Given a set of objects (A, B, C, . . . ), each
described by a set of attribute-value pairs, and given a
classification of these objects into categories, it is desirable
to find a similarity function which accounts well for this
classification. A similarity function is used to test the
classification of an object S by computing the similarity SIM(S,
N) of the object S to each other classified object N, and
identifying the k most-similar objects. An object S is
correctly classified when it is already in the same class as the
majority of the k objects most similar to it. The similarity
function accounts well for a classification when only a small
number of objects are not correctly classified. This is
obtained when coefficients are found for the similarity
function which result in an error rate that is considered to be
an acceptable level.

In accordance with the present invention, a method for
modelling a similarity function using a neural network
comprises the steps of:

(a) inputting feature vectors to a raw input stage of a
neural network respectively for object S in a given category,
for another object G in the same category being compared to the
object S, and for object B outside the given category;

(b) coupling the raw inputs of feature vectors for S, G, and
B to a set of input elements in an input layer of the neural
network for performing respective set operations required
for the similarity function (SIM) providing a property of
monotonicity;

(c) coupling the input elements of the input layer to
respective processing elements of a hidden layer of the
neural network for computing similarity function results
adaptively with different values of coefficients w of the
similarity function;

(d) coupling the processing elements of the hidden layer
to respective output elements of an output layer of the neural
network for providing respective outputs of an error function
measuring the extent to which object S is more similar to
object G than to object B; and

(e) obtaining an optimal coefficient w by back propagation
through the neural network which minimizes the error
outputs of the error function.

The above-described estimation method finds the
similarity function which is an approximation of an "ideal"
similarity function for classifying objects in the given
category. The optimal similarity function may be used in
conjunction with clustering methods for object
classification.



BRIEF DESCRIPTION OF DRAWINGS 

The above objects and further features and advantages of
the invention are described in detail below in conjunction
with the drawings, of which:

FIGS. 1, 2, and 3 are call graphs comparing the use of 
conceptual clustering methods to statistical clustering meth- 
ods for the classification of objects in software analysis; 

FIG. 4 is a block diagram of the steps in an automated
software analysis program for classifying software objects
into categories according to measured shared-name attribute
values and generating a composition chart of the categorized
objects;

FIG. 5 is a schematic drawing illustrating the
problem of deriving a similarity function which accounts
well for a given object classification;

FIG. 6 is a schematic diagram of the application of a
neural network for estimating the coefficients for the
similarity function through feed-forward, back propagation.

DETAILED DESCRIPTION OF PREFERRED
EMBODIMENTS

A conceptual cluster is a group of entities that have
common properties which can describe a unifying concept.
An initial step is to find the common properties of units of
software, and then to use them to place the units in groups.
These groups, and groups of groups, form the composition
structure chart, which is termed a "classification tree" or
"concept tree".

Conceptual clustering is a special case of statistical
clustering. In the general case, any kind of information about
objects can be used to cluster them, and there need not be
any list of common properties. Conceptual clustering can be
used when each individual object has similar characteristics
or attributes, and the objects are grouped according to these
attributes.

When the units of a subsystem share a common purpose,
it is often reflected in a set of variables, procedures, and data
types that they use. These may actually be hidden within the
subsystem, where no other units may use them, or they may
simply be used intensively within the subsystem, and only
rarely outside it. One measure of the quality of a subsystem
is the extent to which its members use the same software
units. Another reflection of a common purpose is the extent
to which the members of a subsystem are used together by
other software units.

It is herein recognized that the names of (non-local)
variables, procedures, macros, and types used by a software
unit are important characteristics of the unit, and may be
used effectively to cluster it. These names are hereinafter
referred to as the NAMES-USED. Furthermore, the names of
other units, in which a given unit's name is used, may also be
used effectively to cluster a unit and are hereinafter referred
to as the USER-NAMES.


Thus, the NAMES-USED and USER-NAMES can be
treated as "features" or "attributes" of the software unit.
Doing so allows the use of similarity measures based on
these shared features. While the literature relating to
classification normally uses dissimilarity measures, the more
intuitive term "similarity measure" is preferred herein. The
use of similarity measures in turn allows the use of
automated "conceptual clustering" methods, which were
originally developed for classification, pattern recognition,
information retrieval, and machine learning, and are applied
herein to software analysis.

The application of clustering in the present invention
differs considerably from prior art applications. In the
present invention, a goal of conceptual clustering is to look
for shared implementation concepts, as reflected in shared
names. Prior systems, on the other hand, look either for
similar specifications, as in software re-use libraries, or for
operational subsystems, as in software error prediction
metrics. In the present invention, similar units grouped together
are all drawn from the same software system, whereas prior
art clustering for re-use classifies units drawn from different
systems. In statistical clustering applications, the similarity
measures are designed in isolation, rather than being derived
from a category utility measure. This means that the quality
of the overall tree structure is not explicitly considered in the
design of the similarity measure. In most published software
clustering experiments, similarity has been based on the
number of "connections" between software units. The
connections may be names used, procedures called, data
dependency paths, or other kinds of connections. The drawback of
this approach is that two software units that are not
connected to each other are unlikely to be placed in the same
groups even if they perform similar functions for many of the
same "clients" in the software system.

In the present invention, a similarity function based on the
shared characteristics NAMES-USED and USER-NAMES,
rather than on connections between the compared groups, is
found to be more effective at discovering objects that are
similar in design, especially when those objects have few
direct connections between them. For example, the SINE
and COSINE routines are common routines in a mathematical
software library. One would not expect that either one of
them will actually call, or pass data to, the other. On the
other hand, one could expect that many of the software
modules that call the SINE routine would also call the
COSINE routine, and vice versa. This situation is portrayed
in a hypothetical call graph shown in FIG. 1.

A similarity measure based on the prior art method of
measuring connection strength might determine that SINE is
more similar to A, B, or C than it is to COSINE. Clustering
the two most similar nodes might produce the graph in FIG.
2 (modulo permutations of (A, B, C) and (SINE,
COSINE)), which is not an acceptable result. Rather, what is
required is a similarity measure that recognizes the parallel
structure apparent in FIG. 2.

The similarity measures for NAMES-USED and
USER-NAMES employed in the present conceptual clustering
methods, i.e., those based on "shared names", do this very
well. In FIG. 1, both SINE and COSINE share
USER-NAMES A, B, and C. Conversely, A, B, and C all share the
NAMES-USED SINE and COSINE. Clustering the two
nodes that share the most common names produces the
graph in FIG. 3 which represents the correct groupings.

Another clustering analysis method in the present invention
represents edges in a graph as "features" of the nodes
they connect, and measures similarity of nodes by looking at
which features two nodes have in common, and which are
different. In software engineering graphs, nodes (A, B, C,
. . . ) are represented by objects of the same names, and edges
(X,Y) are represented by giving the object X a feature #Y,
and giving the object Y a feature &X, where the name
convention of &M and #M are used to represent an object's
predecessors and successors in the graph, respectively.
These graphs represent cross-reference information. The set
of features (#M) of object X represent the non-local names
that it uses (its NAMES-USED), and the set of features
(&M) represent the names of other software units that use
the name X (its USER-NAMES). In the previous example,
the SINE and COSINE routines both have features (&A,
&B, &C), and the routines A, B, and C have features
(#SINE, #COSINE).
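
The edge-to-feature representation just described may be sketched
as follows. The function name and the list-of-pairs input format are
assumptions made for the illustration; only the #M and &M naming
convention comes from the description above.

```python
from collections import defaultdict

def features_from_edges(edges):
    """Give X the feature '#Y' and Y the feature '&X' for every edge (X, Y)."""
    features = defaultdict(set)
    for user, used in edges:
        features[user].add("#" + used)   # successor: a name that `user` uses
        features[used].add("&" + user)   # predecessor: a unit that uses `used`
    return dict(features)

# Call graph of the SINE/COSINE example: A, B, and C each call both routines.
calls = [(caller, routine)
         for caller in ("A", "B", "C")
         for routine in ("SINE", "COSINE")]
feats = features_from_edges(calls)
```

With this input, `feats["SINE"]` and `feats["COSINE"]` both hold the
features &A, &B, and &C, while `feats["A"]` holds #SINE and #COSINE.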

By representing the cross-references as features, we can
easily identify names that are common to (shared by) two
nodes by comparing their feature lists. Useful similarity
measures can be defined by counting the number of shared
names, non-shared names, or both. Aggregate measures can
also be defined for measuring similarity between two groups
of objects, by looking at the frequency with which common
features occur in the groups. Shared-name similarity
measures do not replace or subsume connection strength
measures. If two nodes are connected, but have no shared names,
the similarity between them will be zero. However, it is
recognized that composite similarity measures based on both
connection strength and shared names may be useful.
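
A minimal sketch of one such counting measure follows. Dividing the
number of shared names by the number of all names held by either
node is an assumed normalisation chosen for the example; it is only
one of the measures that could be built from shared and non-shared
name counts.

```python
def shared_name_similarity(a, b):
    """Assumed measure: shared features divided by all features of either node."""
    total = len(a | b)
    return len(a & b) / total if total else 0.0

features = {
    "SINE": {"&A", "&B", "&C"},
    "COSINE": {"&A", "&B", "&C"},
    "A": {"#SINE", "#COSINE"},
}

# SINE and COSINE never call each other, yet share all of their features.
sine_cosine = shared_name_similarity(features["SINE"], features["COSINE"])
# A is connected to SINE by a call, yet the two share no names at all.
a_sine = shared_name_similarity(features["A"], features["SINE"])
```

Here `sine_cosine` is 1.0 and `a_sine` is 0.0, which illustrates why a
shared-name measure does not subsume a connection strength measure.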

Even where it is not known exactly what information a
name represents, seeing the same name occurring in two
software units suggests that their implementations are
related. This may be due to a shared variable, macro, type,
or procedure. In each case, it means that they both rely on
the functional specification of the shared name. If a group of
software units share a set of data types, variables, macros,
and/or procedures which few other units use, the group
should be considered as a potential module.

In FIG. 4, a block diagram illustrates the basic steps in the
application of conceptual clustering methods as applied to
software analysis using the above-described attributes of
NAMES-USED and USER-NAMES. In the first block 10,
the source code of the subject software program is scanned
using, for example, a text string search. Objects such as files,
procedures, routines, modules, and/or subsystems are
identified at block 11, and the names of routines called or
referenced by the objects, i.e., NAMES-USED, as well as
the names of routines calling or referencing the objects,
i.e., USER-NAMES, are
extracted at blocks 12 and 13. The shared features of the
objects, i.e., shared NAMES-USED and USER-NAMES,
are compared at block 14, and a pair of attribute values are
assigned at block 15 based upon measures of the extent of
shared names. At block 16, the objects are compared by
graphing them using their attribute values for coordinates on
the graph axes. At block 17, the objects are classified into
categories, by human expertise through visual analysis of
the graph, and/or by computing similarity measures using a
similarity function for calculating their "closeness", e.g.,
Euclidean distance, to neighboring objects. The similarity
function is described in greater detail below in conjunction
with a coefficient estimation method of the present invention.
When the objects have been classified into categories,
a charting routine at block 17 is used to generate a tree
composition chart using conceptual clustering algorithms.

In copending U.S. patent application Ser. No. 07/525,376
of Schwanke et al., entitled "Method For Software Structure
Analysis Using Conceptual Clustering", filed on May 17,
1990, various conceptual clustering methods based upon
shared-name features are disclosed. Hierarchical ascending
classification (HAC) algorithms are described for forming
clusters (classes) of objects in a bottom-up fashion, by
forming small clusters of closely-related or very similar
objects, then combining the small clusters into larger
clusters, then finally forming a classification tree whose leaves
are the original objects, and whose interior nodes are the
classes. Contrarily, partitioning algorithms are used to divide
a set of classes into two or more classes, then recursively
divide each class. A "massage" recursive procedure is used
to increase the average branching factor of a tree by
eliminating low-utility interior nodes. Utility is measured by a
category utility function CU(C) which is the product of the
size and "purity" of a category. "Purity" is measured as the
sum of squares of feature frequencies, i.e., of the probability
that a member of a category has a given feature. Reference
is made to the parent application for a more detailed
description of these object classification methods.
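
The category utility measure described above may be sketched as
follows, under the assumption that a category is given as a list of
feature sets, one per member. The function names are hypothetical,
and the parent application may define the measure with further
detail.

```python
from collections import Counter

def purity(category):
    """Sum over features of the squared fraction of members having the feature."""
    counts = Counter(f for member in category for f in member)
    n = len(category)
    return sum((c / n) ** 2 for c in counts.values())

def category_utility(category):
    """CU(C): size of the category multiplied by its purity."""
    return len(category) * purity(category)

# Two members: both have feature "x", only one has feature "y".
cu = category_utility([{"x", "y"}, {"x"}])
```

In this example the feature frequencies are 1 and 1/2, so the purity
is 1 + 1/4 = 1.25 and the utility of the two-member category is 2.5.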

The above-described software analysis techniques can be
incorporated in a software architect assistant program. An
aspect of the present invention relates to a graphical and
textual "structure chart editor" for maintaining large
software systems, hereinafter referred to as "ARCH". The
ARCH program includes a feature extracting tool for
extracting cross-reference features from existing source
code, and an editor for browsing the cross-reference
information, editing a subsystem composition tree (which may be
initially input based upon an existing composition
specification), automatically clustering procedures into modules
and modules into subsystems, and identifying strong
connections between procedures located in different modules.

In the present invention, it is recognized that providing
feedback on classification decisions can lead to improved
classification. Given a set of objects (A, B, C, . . . ), each
described by a set of attribute-value pairs, and given a
classification of these objects into categories, it is desirable
to find a similarity function which accounts well for this
classification. If a similarity function is found to work well,
it can be relied upon for classifying or confirming the
classification of an object in that category.

FIG. 5 illustrates this problem for a very simple case.
Each point in the scattergram plots an observed object, as
defined by a pair of real numbers, such as its measured
attribute values. The objects have been classified into
Category 1, Category 2, and Category 3 by an analyst or by an
analysis program viewing the plotted data and identifying
clusters.

A similarity function is used to classify an object S by
computing the similarity SIM_w(S,N) of the object S to each
other classified object N, identifying the k most-similar
objects, determining which category contains the majority of
these objects, and placing S in the same class. If no class
contains a majority, the object S cannot be classified. An
object S is deemed correctly classified when it is already in
the same class as the majority of the k objects most similar
to it.

A similarity function accounts for a classification when
there exists a value k such that each object S is correctly
classified. A similarity function accounts well for a
classification when only a small number of objects are not
correctly classified. A similarity function is an approximation
of an "ideal" similarity measure when coefficients w are
estimated for the function which result in an error rate that
is considered to be an acceptable level.
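
The test of how well a similarity function accounts for a given
classification may be sketched as follows. The record layout (a
dictionary with "category" and "point" entries), the sample data, and
the distance-based similarity are assumptions made for the
illustration; any similarity function of two objects could be
substituted.

```python
import math
from collections import Counter

def majority_category(s, objects, sim, k):
    """Category holding a strict majority of the k objects most similar to s."""
    others = [o for o in objects if o is not s]
    nearest = sorted(others, key=lambda o: sim(s, o), reverse=True)[:k]
    category, votes = Counter(o["category"] for o in nearest).most_common(1)[0]
    return category if votes > k / 2 else None   # None: s cannot be classified

def error_rate(objects, sim, k):
    """Fraction of objects not in the majority category of their k neighbors."""
    wrong = sum(majority_category(s, objects, sim, k) != s["category"]
                for s in objects)
    return wrong / len(objects)

def sim(a, b):
    """Assumed similarity: decreases with Euclidean distance between points."""
    return 1.0 / (1.0 + math.dist(a["point"], b["point"]))

objects = [
    {"category": 1, "point": (0, 0)},
    {"category": 1, "point": (0, 1)},
    {"category": 1, "point": (1, 0)},
    {"category": 2, "point": (0.5, 0.5)},   # sits among Category 1 points
    {"category": 2, "point": (10, 10)},
    {"category": 2, "point": (10, 11)},
    {"category": 2, "point": (11, 10)},
]
rate = error_rate(objects, sim, k=3)
```

With k = 3, only the object at (0.5, 0.5) disagrees with the majority
of its nearest neighbors, so one object in seven is not correctly
classified.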

The present invention provides a method using a
feed-forward, back propagation neural network for estimating the
optimal coefficient for the similarity function to make it
account well for a given set of objects and classification of
those objects into categories. Each object is described by a
tuple of values <c, v1 . . . vn>, where c is the object's
assigned category, and v1 . . . vn are the values of n attributes
selected to describe the object.

For example, the similarity function may be related to the
Euclidean distance between objects described by a pair of
attribute values, which are plotted as points on assumed X
and Y axes. Such a function can be expressed as:

$$\mathrm{SIM}_w(\langle x_1,y_1\rangle,\langle x_2,y_2\rangle)=\frac{1}{\left[w\,(x_2-x_1)^2+(y_2-y_1)^2\right]^{0.5}}$$

The coefficient w in the formula represents the relative scale
of units along the X and Y axes. Varying this parameter will
change not only the absolute distances, but also the rank
order of distances between pairs of points. It is desirable to
find the best value for w such that, in most cases, the
similarity measure for a point indicates that it should be in
the same category as the points nearest (most similar) to it.
For example, the two points G6 and B3 (circled in the
diagram) may be more similar to the majority of points in
Categories 2 or 3, depending upon how the value of w of the
similarity function stretches or contracts their relative
distances along the X and Y axes.
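
The effect of w on rank order may be illustrated with the following
sketch. It assumes that the similarity is the reciprocal of a
Euclidean distance in which w scales the squared difference along the
X axis; the points s, g, and b are hypothetical and are not taken from
FIG. 5.

```python
import math

def sim_w(p, q, w):
    """Assumed form: 1 / sqrt(w*(x2-x1)^2 + (y2-y1)^2); requires p != q."""
    (x1, y1), (x2, y2) = p, q
    return 1.0 / math.sqrt(w * (x2 - x1) ** 2 + (y2 - y1) ** 2)

s = (0.0, 0.0)
g = (3.0, 0.0)   # differs from s only along the X axis
b = (0.0, 2.0)   # differs from s only along the Y axis

# With w = 1 the X axis is unscaled, so b (distance 2) is nearer than g (3).
b_nearer_at_1 = sim_w(s, b, 1.0) > sim_w(s, g, 1.0)
# With w = 0.25 the X axis is compressed, so g becomes the nearer point.
g_nearer_at_quarter = sim_w(s, g, 0.25) > sim_w(s, b, 0.25)
```

Both comparisons hold, so changing w alone reverses which of g and b
is ranked as more similar to s.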

A neural network performs a mapping from input data to
output data through a hidden layer. Each layer is comprised
of one or more processing elements, each of which performs an
I/O or given transfer function. A back propagation neural
network is designed to "learn" or "adapt" to perform the
mapping by being given modelling examples of expected
inputs and outputs. A back propagation neural network
algorithm is fully described in the reference text "Parallel
Distributed Processing, The Microstructure of Cognition",
Rumelhart, Hinton & Williams, MIT Press, 1986. In a
typical back propagation system, there are directed
connections from the processing elements in the input layer to the
hidden layer, and from the hidden layer to the output layer.
There are no connections between processing elements in
the same layer, nor cross-connections from the output layer
to the hidden layer or from the hidden layer to the input
layer, i.e., there are no cycles (loops) in the connection
arrangement.

Learning similarity function coefficients from a table of
similarity value orderings is related to a general method
called multidimensional scaling (MDS), which is well
described in Multidimensional Scaling, by Shepard, Romney,
and Nerlove, 1972. However, MDS generally requires
that almost every entity be compared to almost every other
entity, and that the similarities be rank-ordered. Learning an
ordinal measure with a neural network, by measuring two
quantities whose order is known and comparing the results,
has been described by Tesauro in the context of the
Neurogammon Program for evaluating backgammon game moves.

In the present invention, the "ideal" similarity functions
are found by using a feed-forward, back propagation neural
network to estimate the function weighting coefficients w.
The basis of this method is a continuous approximation to
the error function ERR, as follows:

where threshold is typically 0.95, and

Thus, the error function is close to 0 when S is significantly
more similar to G, the "good neighbor", than to B, the "bad
neighbor", and close to (0.95)^2 when the opposite is true.
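The ERR formula itself does not survive in this text. As a minimal sketch only, assume that ERR squares the amount by which a sigmoid of the difference SIM(S,B) - SIM(S,G) exceeds 1 - threshold. This form is an assumption chosen to match the behaviour described above (near 0 when G is clearly the closer neighbor, near 0.95 squared in the opposite case, and with a continuous first derivative); the names `sigmoid` and `err` are hypothetical.

```python
import math


def sigmoid(x: float) -> float:
    """Logistic function, assumed here as the smooth squashing function."""
    return 1.0 / (1.0 + math.exp(-x))


def err(sim_sg: float, sim_sb: float, threshold: float = 0.95) -> float:
    """Assumed error form, not the patent's stated formula.

    sim_sg: similarity of S to the good neighbor G.
    sim_sb: similarity of S to the bad neighbor B.
    """
    excess = sigmoid(sim_sb - sim_sg) - (1.0 - threshold)
    # Squaring the clipped excess keeps the first derivative continuous at 0.
    return max(0.0, excess) ** 2


good_case = err(sim_sg=10.0, sim_sb=0.0)  # S is much closer to G
bad_case = err(sim_sg=0.0, sim_sb=10.0)   # S is much closer to B
```

Under this assumed form, `good_case` is exactly 0 and `bad_case` lies just below 0.95 squared.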



The optimal coefficient estimation method tries to find a
value for w that minimizes the number of times
ERR(SIM_w, <S,G,B>) is significantly greater than 0.

The above-described error function has a continuous first
derivative, making it suitable for solution by gradient
descent. To explain how gradient descent works, an example
is given for using it on a very simple similarity measure, i.e.,
the Inner Product measure routinely used in information
retrieval. The Inner Product measure takes the inner product
of two feature vectors, the result of which is $A \cap B$. Its value
depends only on shared features, not on non-shared
(distinctive) features. Let IP(A,B) be the inner product measure,
which is computed by attaching a weight to each of the
possible shared names, and then summing the weights
associated with names common to A and B:

$$IP(A,B) = \sum_{n \in A \cap B} w_n$$

The optimal weights would then be "learned" by evaluating
ERR(IP, <S,G,B>) for each data tuple, and using gradient
descent to select weights that would reduce ERR. In
particular, the update rule for computing a weight at iteration
t+1 from its previous value at t is:

The gradient descent method repeatedly evaluates the error
function on a data sample and updates the weights, by the
above rule, until the average error over all data items is
minimized. However, the Inner Product measure is not
accurate enough a measure for modelling the desired
similarity functions, and therefore, the more complex functions
described below are the ones preferred for modelling.

An accurate estimation of similarity function coefficients
is obtained by using monotonic similarity functions. A
general description of monotonic similarity functions is
provided in "Features of Similarity" by Amos Tversky, in
Psychological Review, vol. 84, no. 4, July 1977. Let all
attributes of objects A, B, C ... be boolean, meaning that
each attribute can only have the value true or false. Each
object can then be represented equivalently as a set of true
attributes. Let a, b, c ... be the sets of attribute names for
which objects A, B, C . . . , respectively, have true values.
The similarity function SIM_w is monotonic if:

$$SIM_w(A,B) \ge SIM_w(A,C)$$

whenever the following set relationships hold:
$a \cap b \supseteq a \cap c$,
$a - c \supseteq a - b$, and
$c - a \supseteq b - a$,

where "$\cap$" represents a set intersection, "$\cup$" represents a set
union, "$-$" represents set subtraction (the set of values that
are in set-a but not in set-b), "$\supseteq$" represents "includes", i.e.,
set-a includes set-b or set-b is a subset of set-a, and "$\supset$"
represents "proper inclusion", i.e., set-a has at least one
element that set-b does not. Furthermore, the above
inequality is strict whenever at least one of the set inclusions is
proper.

When the similarity function SIM has the monotonicity
property, the coefficient estimation method can use
monotonic exclusion to exclude unwanted data from the training
set. That is, object S is considered monotonically more
similar to B than to G when attribute sets s, b, and g satisfy
the inclusion conditions of the monotonicity property such
that S will always be measured by the similarity function as
more similar to B than to G regardless of the values of the
coefficient w.

A special variation of the monotonic-exclusion method
provides for restricting the class of similarity functions to
those functions related to the Tversky Model of human
similarity judgment, which are of the form:

$$SIM_w(A,B) = S(a \cap b,\ a - b,\ b - a)$$

Such functions embody the matching property. To ensure
monotonicity, the following two functional forms CONTRAST
or RATIO may be used:

$$CONTRAST(x,y,z) = f(x) - w_1 f(y) - w_2 f(z)$$

$$RATIO(x,y,z) = \frac{f(x)}{f(x) + w_1 f(y) + w_2 f(z)}$$

If the underlying function f is monotonic by set inclusion,
i.e., if $x \subset y$ implies $f(x) < f(y)$, then both the CONTRAST and
RATIO functions will be monotonic, matching similarity
functions.
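The CONTRAST and RATIO forms, as written out in claims 10 and 11, can be tried on small feature sets. The sketch below is illustrative only: it assumes the Tversky-style argument order (shared features, then the features distinctive to each object), takes f to be set cardinality (one function that is monotonic by set inclusion), and returns 0.0 when the RATIO denominator is zero. All function names, and the zero-denominator choice, are assumptions made for the example.

```python
from typing import Callable, FrozenSet

Features = FrozenSet[str]
Measure = Callable[[Features], float]


def contrast(x: Features, y: Features, z: Features,
             f: Measure, w1: float, w2: float) -> float:
    """contrast(x, y, z) = f(x) - w1*f(y) - w2*f(z)."""
    return f(x) - w1 * f(y) - w2 * f(z)


def ratio(x: Features, y: Features, z: Features,
          f: Measure, w1: float, w2: float) -> float:
    """ratio(x, y, z) = f(x) / (f(x) + w1*f(y) + w2*f(z))."""
    denominator = f(x) + w1 * f(y) + w2 * f(z)
    # Returning 0.0 for an empty denominator is a choice made for this sketch.
    return f(x) / denominator if denominator else 0.0


def sim(a: Features, b: Features, form, f: Measure,
        w1: float, w2: float) -> float:
    """Apply a form to (shared, distinctive to a, distinctive to b)."""
    return form(a & b, a - b, b - a, f, w1, w2)


def monotonic_premise(a: Features, b: Features, c: Features) -> bool:
    """True when the three set inclusions of the monotonicity property hold."""
    return ((a & b) >= (a & c)
            and (a - c) >= (a - b)
            and (c - a) >= (b - a))


a = frozenset({"p", "q", "r"})
b = frozenset({"p", "q"})
c = frozenset({"p", "s"})

if monotonic_premise(a, b, c):
    # Both forms should then rate A at least as similar to B as to C.
    contrast_ok = sim(a, b, contrast, len, 0.5, 0.5) >= sim(a, c, contrast, len, 0.5, 0.5)
    ratio_ok = sim(a, b, ratio, len, 0.5, 0.5) >= sim(a, c, ratio, len, 0.5, 0.5)
```

For these sets the inclusions hold, and both forms order the pair (A,B) above (A,C) for the non-negative weights used here. A triple whose inclusions hold in the wrong direction for its known classification is the kind that monotonic exclusion would drop from the training set.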

A further modification uses a similarity function that takes
into account whether or not the compared objects are
"linked". Linked(A,B) is a function whose value is 0 or 1 for
every pair of inputs, and whose value is known for every pair
of objects in the original classification. Intuitively, two
objects are more similar if they are linked than if they are
not. This modification can be adapted for monotonic
functions by including in the definition of monotonicity the
condition $Linked(A,B) \lor \lnot Linked(A,C)$. It can also be
adapted to ratio similarity functions by redefining:

The linear definition for W works equally well on this
modification as on the original method. All of the methods
described above can be applied to analyzing software
systems by treating modules as object classes, declaration units
in modules as objects, deriving features from
cross-references, and defining the link function Linked(A,B)=1 if
object A refers to object B, or if object B refers to object A,
and Linked(A,B)=0 otherwise.
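The link function for software units can be sketched directly from that definition. The example below assumes the cross-references are available as a mapping from each declaration unit to the set of units it refers to; that representation, the helper name `make_linked`, and the unit names are hypothetical.

```python
from typing import Callable, Dict, Set


def make_linked(references: Dict[str, Set[str]]) -> Callable[[str, str], int]:
    """Build Linked(A, B) from a table of cross-references.

    references[a] is assumed to hold the names that unit a refers to.
    """
    def linked(a: str, b: str) -> int:
        a_refers_to_b = b in references.get(a, set())
        b_refers_to_a = a in references.get(b, set())
        # The link is symmetric: a reference in either direction counts.
        return 1 if (a_refers_to_b or b_refers_to_a) else 0

    return linked


# Hypothetical declaration units and their outgoing references.
cross_refs = {
    "parse": {"next_token"},
    "next_token": set(),
    "emit": set(),
}
linked = make_linked(cross_refs)
```

Here `linked("next_token", "parse")` is 1 because `parse` refers to `next_token`, while `linked("parse", "emit")` is 0.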

A general example is now described for similarity function
modeling using a neural network with one hidden layer.
The network accepts N inputs at each processing element,
computes K intermediate results in the hidden layer, and
produces O outputs. Specifically, the inputs would be in the
form of a sequence (i_j) of ones and zeroes representing the
features of object A and object B. If there are N possible
features, then the neural network computes the function
SimNN as follows:



$$SimNN(A,B) = \sigma\left(\sum_{k=1}^{K} w_{2,k}\, \sigma\left(\sum_{j=1}^{2N} w_{1,jk}\, i_j\right)\right)$$



The advantage of using the neural network with a hidden 
layer is that the K intermediate sums can account for 
correlated features, whereas functions which are linear com- 
binations of their inputs do not account for correlations. 
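A forward pass through such a one-hidden-layer network can be sketched as follows. This is an illustration under assumptions: the 2N inputs are the N feature bits of A followed by the N feature bits of B, every hidden element applies a sigmoid to its weighted sum, and a single output element applies a sigmoid to the weighted sum of the K hidden results. The function and parameter names are hypothetical.

```python
import math
from typing import List, Sequence


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def sim_nn(a_bits: Sequence[int], b_bits: Sequence[int],
           w1: List[List[float]], w2: List[float]) -> float:
    """Forward pass of a one-hidden-layer similarity network.

    a_bits, b_bits: N ones and zeroes each, one per possible feature.
    w1: K rows of 2N input-to-hidden weights.
    w2: K hidden-to-output weights.
    """
    inputs = list(a_bits) + list(b_bits)
    # Each hidden element sees all 2N inputs at once, so it can respond to
    # combinations of features rather than to each feature in isolation.
    hidden = [sigmoid(sum(w * x for w, x in zip(row, inputs))) for row in w1]
    return sigmoid(sum(v * h for v, h in zip(w2, hidden)))


# N = 2 features and K = 3 hidden elements, with arbitrary example weights.
example_w1 = [[0.5, -0.5, 0.5, -0.5],
              [1.0, 1.0, -1.0, -1.0],
              [0.0, 0.2, 0.0, 0.2]]
example_w2 = [1.0, -1.0, 0.5]
score = sim_nn([1, 0], [1, 1], example_w1, example_w2)
```

Because the output passes through a sigmoid in this sketch, `score` always lies strictly between 0 and 1.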

Combining SIMNN with the ERR function defined above 
produces the following result 

In FIG. 6, a neural network arrangement is shown for the
implementation of back propagation modelling of the
monotonic similarity functions. The network has a raw input layer
30. For each of the data tuples <S,G,B> of the training set,
the three feature sets are input. Each feature set may be
represented as a vector of numbers, each number representing
either the presence or the absence of that feature in the set.
All N elements of each vector are connected to the relevant
set operation boxes in the next layer, with no weights
attached to the links.

A network input layer 31 consists of processing elements
which perform the respective input set functions shown.
Each square box specifies a set operation, which is
implemented for a pair of feature vectors. The output of each
square box is a feature vector containing N elements. Each
of the N elements of each output feature vector is linked to
each of K copies of the circular node directly above it, with
a link weight of $w_{1,jk}$. These link weights are the same for
the output of each set operation box.

A hidden layer 32, which performs the K intermediate sums
adaptively to the coefficient weights w, contains the operator
"sigma SUM", which represents applying the sigmoid
function to the sum of the inputs of the node. The outputs of
layer 32 are single numbers. All K copies of a node in layer
32 transmit their outputs to the same node directly above
them in layer 33. Each connection link has a different
weight, $w_{2,k}$. Again, the same weights are used at all six
links in this portion of the network.

A network output layer 33 for summing error function
values sums the inputs of the nodes and transmits the result
to the next layer. A further layer 34 implements the matching,
monotonic function RATIO for each partial error function
output. Two links are used from the first node in layer 33
because that value is used twice in the RATIO function,
once with a coefficient of "1" and once with a coefficient of
"w_1". In this layer each link weight appears twice, once in
the left half and once in the right half. The final layer 35
computes the difference (comparison) value. Layer 35 is
connected to layer 34 by links with weights of positive or
negative "1". Summing these two inputs is equivalent to
computing the difference of two values of the RATIO
function. Applying the sigmoid function to this difference
results in the error function.

Estimation of the optimal values of w is obtained by back
propagation through the neural network to minimize the
average value of the error function ERR over the similarity
measures data of the training set. Note, for weights that
appear more than once in the network, during
backpropagation the error term assigned to a weight is the sum of the
error terms assigned to it for each of the links to which it is
attached. This follows directly from the mathematical
definition of backpropagation.
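The rule for weights that appear more than once can be illustrated on a deliberately simplified model. The sketch below is not the FIG. 6 network: it substitutes the Inner Product measure for RATIO, assumes the error form max(0, sigmoid(SIM(S,B) - SIM(S,G)) - (1 - threshold)) squared, and assumes a plain gradient step of the form w(t+1) = w(t) - rate * gradient. The learning rate, epoch count, and all names are hypothetical. One weight vector is shared by the copy that scores S against G and the copy that scores S against B, so each weight's gradient is the sum of the two copies' contributions.

```python
import math
from typing import List, Sequence, Tuple

Bits = Sequence[int]


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def ip(w: Sequence[float], a: Bits, b: Bits) -> float:
    """Inner Product measure: sum of the weights of features shared by a and b."""
    return sum(wn * an * bn for wn, an, bn in zip(w, a, b))


def err_and_grad(w: Sequence[float], s: Bits, g: Bits, b: Bits,
                 threshold: float = 0.95) -> Tuple[float, List[float]]:
    """Assumed error for one <S,G,B> triple and its gradient w.r.t. w."""
    d = ip(w, s, b) - ip(w, s, g)
    p = sigmoid(d)
    e = max(0.0, p - (1.0 - threshold))
    upstream = 2.0 * e * p * (1.0 - p)  # derivative of the error w.r.t. d
    # The same weights sit in both copies of the similarity computation.
    from_b_copy = [upstream * sn * bn for sn, bn in zip(s, b)]
    from_g_copy = [-upstream * sn * gn for sn, gn in zip(s, g)]
    # Tied weights: add the error terms from every link that uses the weight.
    grad = [tb + tg for tb, tg in zip(from_b_copy, from_g_copy)]
    return e * e, grad


def train(triples: Sequence[Tuple[Bits, Bits, Bits]], n_features: int,
          rate: float = 0.5, epochs: int = 200) -> List[float]:
    """Plain per-triple gradient descent starting from all weights at 1.0."""
    w = [1.0] * n_features
    for _ in range(epochs):
        for s, g, b in triples:
            _, grad = err_and_grad(w, s, g, b)
            w = [wn - rate * gn for wn, gn in zip(w, grad)]
    return w


# S shares feature 0 with its good neighbor and feature 1 with its bad one.
triples = [([1, 1, 0], [1, 0, 0], [0, 1, 0])]
learned = train(triples, n_features=3)
```

In this toy run the weight of the feature shared with the good neighbor rises and the weight of the feature shared with the bad neighbor falls, which drives the error for the triple toward 0.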

The specific embodiments of the invention described
herein are intended to be illustrative only, and many other
variations and modifications may be made thereto in
accordance with the principles of the invention. All such
embodiments and variations and modifications thereof are
considered to be within the scope of the invention, as defined in the
following claims.

What is claimed is: 

1. A computer-implemented method utilizing a neural
network having a raw input layer, for fitting a model of
similarity to a set of similarity judgments familiar to a
human user for application in software tools for assisting
said human user in performing tasks requiring similarity
judgments, whereby said tasks may include any of
classification and clustering, comprising the steps of:

(a) inputting a set of judgments, one at a time, into said
raw input layer of said neural network, wherein each of
said set of judgments comprises a triple of objects <S,G,B>,
where S is more similar to G than S is to B, and with
each respective object being represented by a vector of
features present in each said respective object;

(b) coupling outputs of said raw input layer of said neural
network to respective inputs of an input layer of a
duplicated neural network where said duplicated neural
network comprises two identical copies of a simpler
network, with first and second sets of link weights
being used for respective ones of said two identical
copies, said first and second sets of link weights being
identical, and with input couplings so arranged that one
of said identical copies computes the similarity of S to
B, and the other of said identical copies computes the
similarity of S to G, said simpler network comprising
a desired functional form of said model of similarity;

(c) coupling an output layer of said duplicated neural
network to a final output node which computes a result
indicative of the difference between two similarity
values previously computed by said two identical
copies, applies an activation function to said result, and
compares a resulting value to a predetermined
threshold to derive an error value; and

(d) deriving optimal link weights for said model of
similarity by backpropagating said error value through
said neural network.

2. A computer-implemented method in accordance with 
claim 1, wherein said tasks requiring similarity judgments 
comprise tasks requiring similarity judgments software 
units. 

3. A computer-implemented method in accordance with
claim 2, wherein said desired functional form of said model
of similarity comprises a monotonic form.

4. A computer-implemented method in accordance with
claim 2, wherein said desired functional form of said model
of similarity comprises a monotonic and matching form.

5. A computer-implemented method in accordance with
claim 2, wherein said desired functional form of said model
of similarity comprises a form wherein a hidden layer of said
neural network computes set union, intersection, and
difference.

6. A computer-implemented method in accordance with
claim 2, wherein said desired functional form of said model
of similarity comprises a form wherein a hidden layer of said
neural network uses linked copies of a feature weight
aggregation function to compute the significance of said
each set of judgments.

7. A computer-implemented method in accordance with
claim 1, wherein the error value, ERR, is:

where SIM_w, a similarity function, is said desired functional
form of said model of similarity using the coefficients w, and
the threshold is typically 0.95, and

8. A computer-implemented method in accordance with
claim 7, wherein said desired functional form of said model
of similarity is monotonic, all attributes of objects A, B, C
are treated as boolean, each object is represented
equivalently as a set of true attributes, sets a, b, c are the sets of
attributes for objects A, B, C, respectively, that have true
values, and SIM_w is monotonic if:

$$SIM_w(A,B) \ge SIM_w(A,C)$$

whenever the following set relationships hold:
$a \cap b \supseteq a \cap c$,
$a - c \supseteq a - b$, and
$c - a \supseteq b - a$,

where "$\cap$" represents a set intersection, "$\cup$" represents a set
union, "$-$" represents set subtraction (the set of values that
are in set-a but not in set-b), "$\supseteq$" represents "includes", i.e.,
set-a includes set-b or set-b is a subset of set-a, and "$\supset$"
represents "proper inclusion", i.e., set-a has at least one
element that set-b does not.

9. A computer-implemented method in accordance with 
claim 8, wherein said desired functional form of said model 
of similarity is restricted to those functions defined in the 
Tversky Model of human similarity judgement. 

10. A computer-implemented method in accordance with
claim 9, wherein the following further functional form of the
similarity function, SIM_w, is used:

$$SIM_w(A,B) = S(a \cap b,\ a - b,\ b - a)$$

where

$$S(x,y,z) = contrast(x,y,z) = f(x) - w_1 f(y) - w_2 f(z)$$

where x, y, z are variables, f() is a monotonic function,
and w_1 and w_2 are representations of weighting.

11. A computer-implemented method in accordance with
claim 9, wherein the following further functional form of
said similarity function, SIM_w, is used:

$$SIM_w(A,B) = S(a \cap b,\ a - b,\ b - a)$$

where

$$S(x,y,z) = ratio(x,y,z) = \frac{f(x)}{f(x) + w_1 f(y) + w_2 f(z)}$$

where x, y, z are variables, f() is a monotonic function,
and w_1 and w_2 are representations of weighting.

12. A computer-implemented method in accordance with
claim 8, wherein Linked(A,B) is a function whose value is
0 or 1 for every pair of inputs, and whose value is known for
every pair of objects in the original classification, and the
restriction of the similarity function to monotonic functions
includes in a definition of monotonicity the condition
$Linked(A,B) \lor \lnot Linked(A,C)$.

* * * * *